12 research outputs found

    Feature Selection Based on Sequential Orthogonal Search Strategy

    This thesis introduces three new feature selection methods based on a sequential orthogonal search strategy, each addressing a different context of the feature selection problem. The first method is a supervised feature selection method called maximum relevance–minimum multicollinearity (MRmMC), which can overcome some shortcomings of existing methods that apply the same form of feature selection criterion, especially those based on mutual information. In the proposed method, feature relevance is measured by correlation characteristics based on conditional variance, while redundancy is eliminated through multiple correlation assessment using an orthogonal projection scheme. The second method is an unsupervised feature selection method based on Locality Preserving Projection (LPP), incorporated in a sequential orthogonal search (SOS) strategy. The locality preserving criterion has proved a successful measure of feature importance in many feature selection methods, but most of them ignore feature correlation and therefore fail to account for redundant features. This problem motivated the second method, which evaluates feature importance jointly rather than individually. In this method, the first LPP component, which contains the information of the local largest structure (LLS), is used as a reference variable to guide the search for significant features. This method is referred to as sequential orthogonal search for local largest structure (SOS-LLS). The third method is also an unsupervised feature selection method with essentially the same SOS strategy, but it is specifically designed to be robust to noisy data. As limited work has been reported on feature selection in the presence of attribute noise, the third method attempts to address this scarcity by further exploring the second proposed method. It is designed to deal with attribute noise in the search for significant features: kernel pre-images (KPI) based on kernel PCA replace the first LPP component as the reference variable. This feature selection scheme is referred to as the sequential orthogonal search for kernel pre-images (SOS-KPI) method. The performance of the three methods is demonstrated through comprehensive analysis on public real datasets of different characteristics and comparative studies with a number of state-of-the-art methods. Results show that each of the proposed methods is able to select more efficient feature subsets than the other feature selection methods in the comparative studies.

    A comparative effectiveness of hierarchical and non-hierarchical regionalisation algorithms in regionalising the homogeneous rainfall regions

    Descriptive data mining has been widely applied in hydrology, in the form of regionalisation algorithms, to identify statistically homogeneous rainfall regions. However, previous studies employed agglomerative hierarchical and non-hierarchical regionalisation algorithms that require post-processing techniques to validate and interpret the analysis results. The main objective of this study is to investigate the effectiveness of automated agglomerative hierarchical and non-hierarchical regionalisation algorithms in identifying homogeneous rainfall regions based on a new statistically significant difference regionalised feature set. To pursue this objective, this study collected 20 historical monthly rainfall time series from rain gauge stations located in the Kuantan district. In practice, these 20 rain gauge stations can be categorised into two statistically homogeneous rainfall regions, characterised by distinct spatial and temporal variability in rainfall amounts. The results of the analysis show that the Forgy K-means non-hierarchical (FKNH), Hartigan-Wong K-means non-hierarchical (HKNH), and Lloyd K-means non-hierarchical (LKNH) regionalisation algorithms are superior to the other automated agglomerative hierarchical and non-hierarchical regionalisation algorithms. Furthermore, FKNH, HKNH, and LKNH yielded the highest regionalisation accuracy among the algorithms compared. Based on the regionalisation results of this study, the reliability and accuracy of risk assessments for extreme hydro-meteorological events in the Kuantan district can be improved. In particular, regional quantile estimates can provide a more accurate estimation than at-site quantile estimates when an appropriate statistical distribution is used.
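
    As a rough illustration of the Lloyd-style K-means iteration named in the abstract above, the sketch below groups stations by alternating nearest-centroid assignment and centroid updates. It is not the study's implementation: the function name `lloyd_kmeans`, the random initialisation from the data points, and the two-column station features are assumptions made for this example, and the numbers are invented toy values rather than Kuantan rainfall data.

```python
import random

def lloyd_kmeans(points, k, iters=100, seed=0):
    """Lloyd iteration: assign points to the nearest centroid, then recompute centroids."""
    rng = random.Random(seed)
    # Assumption: initial centroids are k distinct data points chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        new_labels = [
            min(range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
            for p in points
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous centroid if a cluster becomes empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical station features: (mean monthly rainfall, variability index).
stations = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
            (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels, centroids = lloyd_kmeans(stations, k=2)
print(labels)
```

    The label numbers themselves are arbitrary; only the grouping of stations into regions is meaningful.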

    A New Maximum Relevance-Minimum Multicollinearity (MRmMC) Method for Feature Selection and Ranking

    A substantial number of the datasets stored for various applications are high dimensional, with redundant and irrelevant features. Processing and analysing data under such circumstances is time-consuming and makes it difficult to obtain efficient predictive models. There is a strong need to analyse high-dimensional data in lower dimensions, and one approach to achieve this is feature selection. This paper presents a new relevancy-redundancy approach, called the maximum relevance–minimum multicollinearity (MRmMC) method, for feature selection and ranking, which can overcome some shortcomings of existing criteria. In the proposed method, feature relevance is measured by correlation characteristics based on conditional variance, while redundancy is eliminated through multiple correlation assessment using an orthogonal projection scheme. A series of experiments was conducted on eight datasets from the UCI Machine Learning Repository, and the results show that the proposed method performed reasonably well for feature subset selection.
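
    The general idea of removing redundancy by orthogonal projection can be sketched as a greedy forward search. The code below is not the MRmMC criterion from the paper: it swaps in the plain squared correlation with the target as the relevance score (the paper uses a measure based on conditional variance), and the name `orthogonal_forward_ranking` and the toy data are assumptions for illustration only.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def centre(v):
    m = sum(v) / len(v)
    return [a - m for a in v]

def orthogonal_forward_ranking(columns, y, n_select, eps=1e-12):
    """Greedily rank features; `columns` is a list of feature columns."""
    yc = centre(y)
    yy = dot(yc, yc)
    residuals = {j: centre(col) for j, col in enumerate(columns)}
    selected = []
    while residuals and len(selected) < n_select:
        scores = {}
        for j, r in residuals.items():
            rr = dot(r, r)
            # A near-zero residual means the feature is (almost) a linear
            # combination of the features already selected, i.e. redundant.
            scores[j] = 0.0 if rr < eps else dot(r, yc) ** 2 / (rr * yy)
        best = max(scores, key=scores.get)
        q = residuals.pop(best)
        qq = dot(q, q)
        if qq < eps:  # only redundant candidates remain
            break
        selected.append(best)
        # Project the chosen direction out of every remaining candidate.
        for j in list(residuals):
            coef = dot(residuals[j], q) / qq
            residuals[j] = [a - coef * b for a, b in zip(residuals[j], q)]
    return selected

# Toy data: feature 1 is an exact multiple of feature 0.
x0 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x1 = [2.0 * a for a in x0]
x2 = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
y = [3.0 * a + b for a, b in zip(x0, x2)]
print(orthogonal_forward_ranking([x0, x1, x2], y, n_select=3))
```

    Because each chosen feature is projected out of the remaining candidates, an exact duplicate is left with a zero residual and is never selected, which is the redundancy-elimination effect that the orthogonal projection scheme is meant to provide.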

    Quantitative Assessment Of Bachelor Mechanical Engineering Programs By Senior Exit Surveys At Universiti Malaysia Pahang

    Continuous program improvement plays an important role in ensuring that engineering programs produce high-quality graduates. This paper explores the assessment of continuous program improvement in the mechanical engineering programs at Universiti Malaysia Pahang. Rapid changes in job market demands have made it vital for engineering education providers to adopt a strategy of continuous quality improvement (CQI) for their academic programs. An exit survey was used as a tool to assess the graduating students of the Faculty of Mechanical Engineering (FKM), Universiti Malaysia Pahang. The survey covered students' impressions of teaching and learning, skills and knowledge related to the Program Outcomes (POs), student ratings of lecturers' contributions, student opinions of academic resources, and overall graduate preparation by FKM. The outcome of the assessment process, and how it can be used to improve the mechanical engineering program, was also investigated. The survey was conducted during the last academic semester (first semester of the 2008/2009 academic year). All program outcomes received a student rating of at least 70% for skills and knowledge preparation related to the program outcomes. This supports the outcome-based education strategy, in which the entire curriculum should map to the program outcomes. Based on this survey, more than 76.8% of the respondents agreed that the mechanical engineering program is suitable and would recommend it to the community.

    Dual level searching approach for solving multi objective optimisation problems using hybrid particle swarm optimisation and bats echolocation-inspired algorithms

    A dual-level searching approach for multi-objective optimisation problems using particle swarm optimisation and a modified adaptive bats sonar algorithm is presented. The echolocation concept of the modified adaptive bats sonar algorithm, in which a colony of bats searches for prey, is integrated with the established particle swarm optimisation algorithm. The proposed algorithm combines the advantages of both approaches, including swarm flight attitude and swarm searching strategy, to handle the complexity of multi-objective optimisation problems. The performance of the algorithm is verified on several multi-objective optimisation benchmark test functions and problems. The results show that the proposed algorithm performs well and produces a reliable Pareto front. The proposed algorithm can thus be an effective method for solving multi-objective optimisation problems.
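
    For readers unfamiliar with the term, a Pareto front is the set of candidate solutions that no other candidate dominates. The sketch below shows only that filtering step, assuming every objective is to be minimised; it does not implement the hybrid particle swarm and bats sonar algorithm, and the function names and sample points are made up for this example.

```python
def dominates(a, b):
    """True if `a` is no worse than `b` in every objective and strictly better in at least one."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def pareto_front(points):
    """Keep the points that are not dominated by any other point."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Hypothetical objective vectors (f1, f2), both to be minimised.
candidates = [(1, 5), (2, 3), (3, 4), (4, 1), (5, 5)]
print(pareto_front(candidates))  # [(1, 5), (2, 3), (4, 1)]
```

    This pairwise check is quadratic in the number of candidates, which is adequate for illustration but not tuned for large swarms.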

    The efficiency of average linkage hierarchical clustering algorithm associated multi-scale bootstrap resampling in identifying homogeneous precipitation catchments

    Owing to limited historical precipitation records, agglomerative hierarchical clustering algorithms are widely used to extrapolate information from gauged to ungauged precipitation catchments, yielding more reliable projections of extreme hydro-meteorological events such as extreme precipitation. However, accurately identifying the optimum number of homogeneous precipitation catchments from the dendrogram produced by agglomerative hierarchical algorithms is very subjective. The main objective of this study is to propose an efficient regionalized algorithm to identify homogeneous precipitation catchments for non-stationary precipitation time series. The homogeneous precipitation catchments are identified using the average linkage hierarchical clustering algorithm associated with multi-scale bootstrap resampling, with the uncentered correlation coefficient as the similarity measure. The regionalized homogeneous precipitation catchments are consolidated using the K-sample Anderson-Darling non-parametric test. The analysis results show that the proposed regionalized algorithm performs better than the agglomerative hierarchical clustering algorithms proposed in previous studies.
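
    The two building blocks named in the abstract above, the uncentered correlation coefficient and average linkage, can be written down briefly. This is a minimal sketch under the assumption that dissimilarity is taken as one minus the uncentered correlation; it omits the multi-scale bootstrap resampling and the Anderson-Darling test, and the function names are chosen here for illustration.

```python
import math

def uncentered_corr(x, y):
    """Correlation computed without subtracting the means (cosine similarity)."""
    num = sum(a * b for a, b in zip(x, y))
    den = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return num / den  # assumes neither series is all zeros

def average_linkage(cluster_a, cluster_b):
    """Mean pairwise dissimilarity (1 - uncentered correlation) between two clusters."""
    total = sum(1 - uncentered_corr(x, y) for x in cluster_a for y in cluster_b)
    return total / (len(cluster_a) * len(cluster_b))

# Hypothetical series: the second is a scaled copy of the first.
print(uncentered_corr([1, 2, 3], [2, 4, 6]))
print(average_linkage([[1, 0]], [[0, 1], [1, 0]]))
```

    An agglomerative procedure would repeatedly merge the pair of clusters with the smallest `average_linkage` value until the desired number of regions remains.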

    Identifying the Ideal Number Q-Components of the Bayesian Principal Component Analysis Model for Missing Daily Precipitation Data Treatment

    Missing precipitation data arise from instrument malfunction, recording errors and meteorological extremes. Consequently, an effective imputation algorithm is much needed to provide a high-quality complete time series for assessing the risk of extreme precipitation events. To address this issue, this study investigates the effectiveness of various numbers of Q-components of the Bayesian Principal Component Analysis model associated with the Variational Bayes algorithm (BPCAQ-VB) in treating missing daily precipitation data, where the ideal number of Q-components is identified using the Technique for Order of Preference by Similarity to Ideal Solution (TOPSIS) algorithm. The effectiveness of the BPCAQ-VB algorithm is evaluated using four distinct precipitation time series, including two monitoring stations located in the inland and coastal regions of Kuantan district, respectively. The analysis results show that BPCA5-VB is superior to the single imputation algorithms proposed in previous studies for the coastal region time series. Conversely, the single imputation algorithm is superior to the BPCAQ-VB algorithm for an inland region time series.
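
    TOPSIS ranks alternatives by how close they are to an ideal solution and how far they are from the worst one. The sketch below follows the standard formulation (vector normalisation, weighting, distances to the ideal and anti-ideal points); the criteria, weights and numbers are hypothetical and are not taken from the study, which does not list its criteria in this abstract.

```python
import math

def topsis(matrix, weights, benefit):
    """Return a closeness score in [0, 1] for each row (alternative) of `matrix`.

    benefit[j] is True if criterion j should be maximised, False if minimised.
    Assumes no criterion column is all zeros and the alternatives are not all identical.
    """
    n = len(weights)
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n)]
    v = [[weights[j] * row[j] / norms[j] for j in range(n)] for row in matrix]
    ideal = [max(r[j] for r in v) if benefit[j] else min(r[j] for r in v)
             for j in range(n)]
    worst = [min(r[j] for r in v) if benefit[j] else max(r[j] for r in v)
             for j in range(n)]
    scores = []
    for r in v:
        d_ideal = math.sqrt(sum((r[j] - ideal[j]) ** 2 for j in range(n)))
        d_worst = math.sqrt(sum((r[j] - worst[j]) ** 2 for j in range(n)))
        scores.append(d_worst / (d_ideal + d_worst))
    return scores

# Hypothetical candidates scored on (imputation error, correlation with observed data).
candidates = [[0.9, 0.6], [0.5, 0.8], [0.7, 0.7]]
scores = topsis(candidates, weights=[0.5, 0.5], benefit=[False, True])
print(scores.index(max(scores)))  # index of the preferred candidate
```

    The candidate with the highest closeness score is preferred; in a setting like the one described, each row would correspond to one choice of the number of Q-components.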
